Principal component analysis of incomplete data – A simple solution to an old problem

Abstract

A long-standing problem in biological data analysis is the unintentional absence of values for some observations or variables, preventing the use of standard multivariate exploratory methods, such as principal component analysis (PCA). Solutions include deleting parts of the data, by which information is lost; data imputation, which is always arbitrary; and restriction of the analysis to either the variables or the observations, thereby losing the advantages of biplot diagrams. We describe a minor modification of eigenanalysis-based PCA in which correlations or covariances are calculated using different numbers of observations for each pair of variables, and the resulting eigenvalues and eigenvectors are used to calculate component scores such that missing values are skipped. This procedure avoids artificial data imputation, exhausts all information from the data, and allows the preparation of biplots for the simultaneous display of the ordination of variables and observations. The use of the modified PCA, called InDaPCA (PCA of Incomplete Data), is demonstrated on actual biological examples: leaf functional traits of plants, functional traits of invertebrates, cranial morphometry of crocodiles, and hybridization data of fish, with biologically meaningful results. Our study suggests that it is not the percentage of missing entries in the data matrix that matters; the success of InDaPCA is mostly affected by the minimum number of observations available for comparing a given pair of variables. In the present study, interpretation of results in the space of the first two components was not hindered, however.

• The PCA algorithm is modified to accommodate incomplete data.
• The method produces biplots of variables and observations simultaneously.
• Information in the data is maximally exhausted, while imputation is not required.
• Variables logically impossible for certain observations are allowed.
• The number of observations rather than the percentage of unknown values matters.
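The procedure summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' InDaPCA implementation: the function names `pairwise_correlation` and `incomplete_pca` are hypothetical, missing values are assumed to be encoded as NaN, and the choice to standardize each variable by its available values and to let missing entries contribute zero to the scores is an assumption about what "skipped" means.

```python
import numpy as np

def pairwise_correlation(X):
    """Correlation matrix in which each pair of variables uses only
    the observations (rows) where both variables are present."""
    n_vars = X.shape[1]
    R = np.eye(n_vars)
    for j in range(n_vars):
        for k in range(j + 1, n_vars):
            both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            R[j, k] = R[k, j] = np.corrcoef(X[both, j], X[both, k])[0, 1]
    return R

def incomplete_pca(X):
    """Eigenanalysis of the pairwise correlation matrix (hypothetical helper).
    Returns eigenvalues, eigenvectors and component scores."""
    # Standardize each variable using only its available values (assumption).
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    R = pairwise_correlation(X)
    eigvals, eigvecs = np.linalg.eigh(R)      # R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Missing standardized values are skipped: they add nothing to the score.
    scores = np.where(np.isnan(Z), 0.0, Z) @ eigvecs
    return eigvals, eigvecs, scores

# Small illustrative data set (made up), NaN marks a missing value.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 4.0, np.nan],
    [4.0, 3.0, 5.0],
    [5.0, 6.0, 4.0],
])
eigvals, eigvecs, scores = incomplete_pca(X)
```

A correlation matrix assembled pair by pair is not guaranteed to be positive semidefinite, so small negative eigenvalues can appear. The fewer observations a pair of variables shares, the less reliable its correlation is, which is in line with the abstract's remark that the minimum number of shared observations matters more than the overall percentage of missing entries.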

Similar resources

A simple approach to guide factor retention decisions when applying principal component analysis to biomechanical data.

The use of principal component analysis (PCA) as a multivariate statistical approach to reduce complex biomechanical data-sets is growing. With its increased application in biomechanics, there has been a concurrent divergence in the use of criteria to determine how much the data is reduced (i.e. how many principal factors are retained). This short communication presents power equations to suppo...

An application of principal component analysis and logistic regression to facilitate production scheduling decision support system: an automotive industry case

Production planning and control (PPC) systems have to deal with rising complexity and dynamics. The complexity of planning tasks is due to multiple variables and dynamic factors derived from uncertainties surrounding the PPC. Although the literature on exact scheduling algorithms, simulation approaches, and heuristic methods in production planning is extensive, they seem to be ineff...

Solution of security constrained unit commitment problem by a new multi-objective optimization method

Abstract: Optimal power flow (OPF), one of the fundamental tools for analyzing complex power systems, has been studied for a long time. OPF optimizes the objective functions of a power system, such as fuel cost, emissions, and losses, while simultaneously satisfying the system's constraints. In its most general form, OPF is a nonlinear, non-convex, large-scale, static optimization problem that can include continuous control variables and ...

A Data-Adaptive Principal Component Analysis

This paper studies a data-adaptive principal component analysis (PCA) that does not require prior information on the data distribution. Ordinary PCA is useful for dimension reduction and for identifying important features of data that consist of a large number of interrelated variables. However, it depends strictly on the Gaussian assumption of the data, and therefore may not be efficient for a...

Feature Dimension Reduction of Multisensor Data Fusion using Principal Component Fuzzy Analysis

These days, the most important areas of research in many different applications, with different tools, focus on how to achieve awareness. One serious application is awareness of the behavior and activities of patients. Its importance is due to the need for ubiquitous medical care for individuals. It is sometimes very important that the doctor knows the patient's physical condition. O...

Journal

Journal title: Ecological Informatics

Year: 2021

ISSN: 1878-0512, 1574-9541

DOI: https://doi.org/10.1016/j.ecoinf.2021.101235